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Abstract 

(— > , 

^1 , We introduce the Directional Gradient- Curvature (DGC) method, a novel 

approach for filling gaps in gridded environmental data. DGC is based on 
an objective function that measures the distance between the directionally 
segregated normalized squared gradient and curvature energies of the sam- 



C<") ! pie and entire domain data. DGC employs data-conditioned simulations, 

which sample the local minima configuration space of the objective function 
instead of the full conditional probability density function. Anisotropy and 



^ ! non-stationarity can be captured by the local constraints and the direction- 

Sh ■ 

dependent global constraints. DGC is computationally efficient and requires 
minimal user input, making it suitable for automated processing of large 
(e.g., remotely sensed) spatial data sets. Various effects are investigated on 
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synthetic data. The gap-filling performance of DGC is assessed in compari- 
son with established classification and interpolation methods using synthetic 
and real satellite data, including a skewed distribution of daily column ozone 
values. It is shown that DGC is competitive in terms of cross validation 
performance. 

Keywords: correlation, anisotropy, spatial interpolation, stochastic 
estimation, optimization, simulation 



i 1. Introduction 



2 Atmospheric data, whether they are obtained by means of ground or 

3 remote sensing methods, often include data gaps. Such gaps arise due to 

4 different reasons, e.g. incomplete time series, spatial irregula rities of sam- 
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ample, remote sensing images may include obscured areas due to cloud 
cover, whereas gaps also appea r between sa t ellite paths where there is no 



9 coverage for a specific period ( jEmili et all 1201 if ). The impact of miss 



io ing d ata on the estimate of sta tistical averages and trends can be signifi- 



ii cant (Si ckles fc Shadwick 



20071 ). There is an interest in the development 
of new methods for filling gaps in atmospheric data and their comparison 
with existing imputation methods (IJunninen et all 120041 ) . Particularly for 
frequently collected, massive remotely sensed data, the efficient filling of the 
gaps is a challenging task . Traditiona l geost atistical interpolation meth- 



i6 ods such as kriging, e.g. ( iWackernagell . 120031 ) . can be impractical due to 



high computational complexity, restriction to Gaussian data, as well as var- 



2 



ib ious subjective choices in va riogram modeling and interpolation search ra- 



19 dius ( IDiggle &: Ribeirol . 120071 ). In particular, compu tationa 



20 ods a re needed for filling gaps in very large data sets (ICressie 



y efficient meth- 



2008 



20081 ). 
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In the following, we consider a set of sampling points G s = {si, i = 
1,...,N}, where Sj = (xi,yi) G M 2 . The points are scattered on a rect- 
angular grid G of size Nq = L x x L y , where L x and L y are respectively 
the horizontal and vertical dimensions of the rectangle (in terms of the unit 
length), such that Nq > N. Let G p = {s p ,p = 1, . . . , P} be the set of predic- 
tion points, representing locations of missing values, such that G = G s U G p . 
The data, Z(G S ) = {z;, Vsj 6 G s }, are considered as a realization of the con- 
tinuous random field Z(si). To reduce the dimensionality of the configuration 
space, we discretize the continuously valued field. For applications that do 
not require high resolution, e.g. environmental monitoring and risk manage- 
ment, 7i(G s ) can be discretized into a small (e.g., eight) number N c of levels 
(classes). Continuous distributions are obtained at the limit N c — > oo. In the 
current study, the spatial prediction of missing values is posed as a spatial 
classification problem for ranked numerical data. Continuous interpolation 
is approximated by considering an arbitrarily high number of levels. 

The discretization classes C q , q = 1, . . . , N c correspond to the intervals 
C q = (t q , t g+ i] for q = 2, . . . , N c - 1, d = (-oo, t 2 ], and C Nc = (t Nc , oo). The 
classes are defined with respect to threshold levels k = 2, . . . , N c . All the 
classes have a uniform width except C\ and Cn c which extend to negative 
and positive infinity respectively, to include values outside the observed in- 
terval [^min, z max \. More general class definitions can be investigated. The 
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43 class identity field I(s) takes integer values q = 1, . . . , N c that represent the 

44 respective class index. In particular, I (si) = q implies that e C q . The 

45 prediction problem is equivalent to assigning a class label at each point in 

46 G p . A map of the process Z can be generated consisting of equivalent-class 

47 (isolevel) contours. 



2. The Directional Gradient-Curvature Model 



The Directional Gradient- Curv ature (DGC) mode l is inspired by Spar- 
tan spatial random fields (SSRF) (IHristopulosl . 120031 ) . which are based on 
short-range interactions between the field values. The SSRF model is para- 
metric and represents stationary, continuous and isotropic Gaussian random 
fields. To relax these assumptions, we introduce an almost non-parametric 
approach that aims at matching short-range correlations in G s with those of 
G. This idea was recently successfully applied to spati al random fields still 
assuming spatial isotropy (jZukovic fc Hristopulod . 120121 ). The present model 
extends this approach by relaxing even the isotropic assumption through in- 
corporating anisotropic dependence. In particular, the correlations used in 
DGC represent the normalized squared gradient and curvature energies of 
the discretized class identity field along different directions. Let a n be the 
lattice step in direction e n . The local square gradient and curvature terms 
in each of the d directions used are given by 



G n (I; Si) 



[I (si + a n e n ) - I (si)]' 



n 



(1) 
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c n (i; sd = m + anSn) + /( T a " 4) " 2m] , n = l,...,d. (2) 

a n 

63 The normalized squared gradient, G n (I g ), and curvature, C n (I g ) energies 

64 in a direction e„, are defined as averages of the above over the grid G. I g = 

65 [I(si) ■ ■ ■ I(sn g )] Vsj G G, represents the values of the class identity field 
ee on the entire grid, I p = [/(si) . . . J(sp)] Vs*j G G p , is the class field at the 
67 prediction sites, and I s = [/(s*i) . . . /(sV)] Vs*j G G s are the class identity 
es values at the sampling sites. The matching of the gradient and curvature 
69 constraints on G s and G is based on the following objective functional: 



d 

U(I P \I S ) = [w l( p{G n (l g ),G n (I s )) +w 2 <p(C n (l g ),C n (I s ))} , (3) 

n=l 

f (l-x/x') 2 , x'^0 
<P(x,x')=\ K ' ' (4) 

I x 2 , x' = 0. 

70 In the above, w = (101,102) represent gradient and curvature weights 

71 (wi,W2 > 0, w\ + 102 = 1), and d is the number of the directions used. 

72 We use (jlj) to measure the deviation between the G s — and G— based val- 

73 ues instead of = (x — x') 2 , because the gradient and curvature can have 

74 very different magnitudes, depending on the units used; this means that one 

75 term may dominate in the optimization. By using normalized constraints we 

76 compensate for the possible disparity of magnitudes between gradient and 

77 curvature. If one of the sample quantities is zero, the second line of (jlj) is 
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78 used to avoid a singular denominator. 

79 We select w = (0.5, 0.5) and d — 4, representing four directions with the 
so following angles (with respect to the positive x-axis): 0°, 45°, 90°, and 135°. 
si Given the above, the classification problem is equivalent to determining the 

82 optimal configuration I p that corresponds to the minimum of 

l p = argmin£/(I p |I s ). (5) 

83 The optimization of fl3]) is conducted numerically. The choice of the initial 

84 configuration is important to obtain a reliable, fast and automatic algorithm: 
as it should prevent the optimization from getting trapped in poor local minima 
se and minimize the relaxation path in configuration space to the equilibrium, 
87 and it should also minimize the need for user intervention. The sampling 
as points retain their values I s . Assuming a certain degree of spatial continuity, 

89 common in geospatial data sets, the initial state of I p is determined based on 

90 the sample states in the immediate neighborhood of the individual prediction 

91 points. The neighborhood of s p is determined byanmxm stencil (m = 2/+1) 

92 centered at s p . Then, the initial value at a prediction point is assigned by 

93 majority rule, based on the prevailing value of its sample neighbors inside the 

94 stencil. The stencil size is chosen automatically, reflecting the local sampling 

95 density and the distribution of class identity values. Namely, it is adaptively 

96 set to the smallest size that contains a finite number of sampling points with 

97 a prevailing value. If no majority is reached up to some neighborhood size 

98 m max x m max , the initial value is assigned (i) randomly from the range of the 

99 labels with tie votes or (ii) from the entire range of labels 1, N c , if majority 

100 is not reached due to absence of sampling points within the maximum stencil. 



6 



101 The above method of initial state assignment will be referred to as majority 

102 rule with adaptable stencil size (MRASS). 

103 The updati ng of class identity states on G p us es the "greedy" Monte Carlo 



(MC) method fjPapadimitriou fe Steiglitzl . Il982l ). which unconditionally ac 



105 cepts a new state if the latter lowers the cost function. The greedy MC 

loe algorithm may cause the termination of the DGC algorithm at local minima 

107 of the objective function ([3]). Targeting exclusively global minima (e.g. by 

108 simulated annealing) unduly emphasizes exact matching of the energies on 
log the entire domain with those in the sample domain; however, the latter are 
no subject to sampling fluctuations and measurement errors. 

in The algorithm performs a random walk through the grid G p . It terminates 

n2 if P consecutive update trials do not produce a single successful update. If 

n3 the computational budget is a concern, the algorithm can terminate when a 

H4 pre-specified maximum number of Monte Carlo steps is exceeded. In either 

us case, the generated realization is accepted only if the residual value of the cost 

lie function is below a user-defined tolerance level tol. Otherwise, the realization 

H7 is rejected and a new one is generated. The algorithm generates M different 

us realizations. The median values from all the accepted realizations at each 

us missing-value point represent the prediction of the algorithm. Associated 

120 confidence intervals are also derived. 

121 The main steps of the procedure described above are summarized by 

122 means of the following algorithm. 

123 1. Define the number of realizations M, the number of classes N c , the 

124 maximum stencil size m max , the residual cost function tolerance tol 

125 and the maximum number of Monte Carlo steps i max (optional). 
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126 2. Discretize Z(G S ) to obtain the sample class identity field I s . 

127 3. Calculate the directional sample energies G n (I s ), C n (I s ), n — 1, . . . , <iQ 

128 4. Initialize the simulated realization index j = 1. 

129 5. while j < M repeat the following steps: 

130 (a) Assign initial values ip '' to the prediction points in G p based on 

131 MRASS. 

132 (b) Calculate the initial energy values G n (l^), C n (Ig^), n — 1, . . . , d, 

133 and the objective function = U(I P °^\I S ). 

134 (c) Initialize the simulated state index i = 0, and the rejected states 

135 index i r = 0. 

136 (d) while (i r < P) A {i < v ax ) repeat the following updating steps: 

137 i. Generate a new state Ip by randomly, i.e., with probability 
us 0.5, adding ±1 to the state I p , maintaining the condition 

1 < < N c , Vsj G G p . 

uo ii. Calculate G n (lg +1 ^), C n (Ig +1 ' ) ), n — 1, . . . , d. 

hi. Calculate U {i+l) = U(I p i+1) \I s ). 
H2 iv. If < U® accept the new state I p m) ; i r 0; 

143 else I p l+1) = 1®; = U®; i r ^i r + l; end. 

144 v. i — >■ i + 1; 

145 end while 

we (e) If U® < tol store the realization i*(j) = 1$; j = j + 1; 

147 else return to 5 (a); end. 

us end while 

1 The algorithm checks if the number of samples for calculating G n (I s ), C n (I s ) is 
sufficient for obtaining reliable estimates. 
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149 6. Evaluate the statistics from the realizations i p (j), j = 1, ...,M. 

150 The DGC method lies between interpolation and conditional simulation. 

151 Interpolation methods provide a single optimal configuration of the miss- 

152 ing values, e.g., kriging is based on the minimization of the mean square 

153 error. Conditional simulation based on Markov Chain Monte Carlo meth- 

154 ods aims to sample the entire configuration space and reconstruct the joint 

155 conditional probability density function of the missing data. DGC on the 
we other hand samples the configuration space that corresponds to local min- 

157 ima of the objective function. Since DGC returns multiple realizations, we 

158 can characterize it as a stochastic method. However, in DGC the sampling 

159 of the configuration space is restricted to the subspace of local minima. The 
wo afforded dimensionality reduction is responsible for the computational efli- 
i6i ciency of the method. 



162 3. DGC validation methodology 



163 In this section we conduct numerical experiments, in which a portion of 

164 the data is set aside to be used for validation of the classification/interpolation 

165 algorithms tested. The performance of DGC is evaluated by calculating the 



1 -*(/&),/&)) 



where I(s p ) is 



lee misclassification rate F* = l/P gG 

167 the true class identity value at the validation points, I(s p ) is the classifies 

168 tion estimate and 5(1, 1') = 1 if I = I', 5(1, 1') = if I ^ V. The gap-filling 

169 of DGC is compared with the fc-nearest neighbor (KNN) 



and fuzzy /c-nearest neighbor (FKNN) (IKeller et all Il985l ) classification al 



Dasarathy 



19911 ) 



171 gorithms. We chose the k values that minimize the cross validation errors 

172 to obtain the lowest achievable errors by KNN and FKNN. The KNN and 
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173 FKN N algorithms are applied using the Matlab® function fknn (lAkbasl . 



20071 ). 



The interp olation perform ance is compared with the inverse distance 
weighted (ID) (jShepardl . Il968l ). nearest neighbors (NN), bilinear (BL), bicu- 
bic (BC), and biharmonic spline (BS) (jSandwelll . 



19871 ) methods. For t 



lie 



178 Gaus sian synthetic data we also include the ordinary kriging (OK) method (IWackernagel . 



20031 ). Given the Gaussian distribution and knowledge of the covariance pa- 



rameters, OK provides optimal predictions and thus also a standard for com- 
paring DGC estimates. The NN, BL, BC and BS interpolation algorithms 
were implemented by means of the Matl ab® functio n griddata. For ID we 



183 used the Matlab® function f illnans (IHowat 



20071 ). Fi nally 



184 used the routines available in the Matlab® library vebyk (ISidler 



or OK we 



2009|) 



Let Z(s p ) be the estimate of the continuous field calculated from the back 
transformation 



Z(s P ) = [t iz{gp) + t iz{gp)+l ]/2, p = 1, . . . , P. 



(6) 



is? If Z(s p ) is the true value at s p the estimation error is e(s p ) = Z(s p ) — Z(s p ). 

188 For JV C » 1 we calculate the following prediction errors: average absolute 

189 error 

AAE=(1/P) J2 \<*p)1 ( 7 ) 

wo average relative error 



ARE = (1/P) <*v)/Z(s 

Sp^zGp 



(8) 
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191 average absolute relative error 



AARE = (1/P) \<s p )\/Z(s p ), 



(9) 



192 root average squared error 



RASE = 



(10) 




193 and linear correlation coefficient R. 



194 



If S sample configurations are considered, the mean values of the vari- 



es dation measures (i.e., the MAAE, MARE, MAARE, MRASE, and MR) are 

we calculated by averaging over the sample configurations. To focus on the lo- 

197 cal performance of DGC, we use the respective "local" errors, i.e., MAE, 

we MRE, MARE, and RMSE, in which the spatial average is replaced by the 

199 mean over predictions obtained from M different simulations. Furthermore, 

200 we record the optimization CPU time, T cpu , and the number of Monte Carlo 

201 steps (MCS). 

202 The computations are performed in Matlab® programming environment 

203 on a desktop computer with 3.25 GB RAM and an Intel® Core™ 2 Quad 

204 CPU Q9650 processor with an 3 GHz clock. 

205 4. Results 

206 Synthetic Data 

207 DGC performance is first studied on synthetic data sampled on regu- 

208 lar grids. The data are simulated from the Gaussian random field Z ~ 
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Table 1: Mean misclassification rate (F*) [%] and standard deviation Sf* for synthetic 
Gaussian data with anisotropic Matern covariance obtained by the DGC, KNN and FKNN 
algorithms. 



Levels 


N c = 8 


iV c = 16 


p[%] 


33 


66 


33 


66 


Model 


DGC KNN FKNN 


DGC KNN FKNN 


DGC KNN FKNN 


DGC KNN FKNN 


(F*) 
S p* 


18.9 29.1 27.5 
1.6 1.4 1.4 


26.9 35.6 34.8 
2.2 1.2 1.2 


26.4 51.4 51.4 
2.1 1.6 1.5 


38.6 58.1 57.9 
2.5 1.1 1.0 



N(m = 50, a = 10) with Whittle-Matern covariance given by cz(f) 



a 2 h v K u (h), where h = \Ai7Ci + r i/^2 an d r = {ri,r 2 ) is the lag dis- 
tance between two points. K v is the modified Bessel function of the second 
kind and of order v, where v = 2.5 is the covariance smoothness parameter. 
The principal axes of anisotropy are aligned with the coordinate axes. The 
correlation length in the vertical direction is set to £2 — 2 and in the hor- 
izontal direction to £1 = 4. The field is sampled on a square grid G, with 
Ng = 50 x 50 nodes using the spectral method ( iDrummond and Horgan . 



19871 ). Missing data samples Z{G S ) of size N = N G - [(p/100%)N G \ are 
generated from the complete sets by randomly removing P = [(p/100%) No\ 
values, which are used as validation points. For different degrees of thinning 
(typically p = 33% and 66%), we generate S = 100 different sampling config- 
urations. The predictions at the removed points are calculated and compared 
with the true values. 

The classification results for the synthetic data are summarized in Tabled] 
The misclassification rate obtained by DGC is considerably smaller than the 
KNN and FKNN rates in all cases, although DGC shows somewhat larger 
sample-to-sample fluctuations. The mean CPU time required by DGC ranges 
between 0.96 and 1.11 seconds and the mean number of Monte Carlo steps 
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228 between 10 4 and 5 x 10 4 . The DGC interpolation performance is evaluated in 

229 Table |2]using iV c = 1000 classes. In terms of validation errors (smallest errors 

230 and largest R), for the uniformly thinned data (p = 33%, 66%) OK ranked 

231 best. As mentioned above, for Gaussian data with known covariance pa- 

232 rameters OK is expected to give optimal predictions. The known directional 

233 correlation lengths also allowed identifying a region of influence around the 

234 prediction points, thus optimizing search neighborhoods and consequently 

235 the OK CPU time. Nevertheless, the OK CPU time was the highest. For 

236 p = 33% the DGC performance ranked second and for p = 66% it was com- 
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Figure 1: DGC interpolation results for synthetic Gaussian data with anisotropic Matern 
covariance based on M = 100 simulation runs on a single sample generated by 66% 
thinning. Sub-figures include (a) original field, (b) thinned sample, (c) interpolated data 
based on the median values from M runs, (d) comparison of the empirical cdfs of the 
original and interpolated data, (e) spatial distribution of the 95% confidence interval (c.i.) 
widths, and (f) root mean squared errors of predictions. 
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237 parable to BS, with the other models performing worse than DGC. We note 

238 that DGC values in Tables [I] and |2] are based on M = 1 simulation run for 

239 each of S — 100 sample realizations. Increased values of M (e.g. M = 100) 

240 only marginally improved the validation results. 

241 To account for more realistic patterns of missing data in remote sensing, 

242 e.g. due to cloud cover, we investigate a sample realization in which a solid 

243 block of data (rectangle of 16 x 8 pixels) is missing (see Fig. [2]). The block 

244 is deliberately chosen to include a small area with extreme values to test the 

245 ability of DGC to predict a "hotspot" . For this study, DCG performs better 

246 than the other methods. To compensate for using one sample (S = 1), we 

247 run M = 100 simulations. Therefore the DGC CPU time is considerably 

248 higher compared to the (a) and (b) cases (M = 1). 

249 The dramatic increase of the OK CPU time is caused by the augmented 

250 search neighborhood necessary to span the missing data gap and the cubic 

251 dependence of kriging on the number of points in the search neighborhood. 

252 Generally, the DGC CPU time is proportional to N c and increases with p, 

253 reflecting the increased dimension of the configuration space and number 

254 of variables involved in the optimization. For p = 66% the optimization 

255 involves approximately 10 6 Monte Carlo steps. As shown in Fig. [TJ multiple 

256 simulation runs allow estimating the interpolation uncertainty, with respect 

257 to the subspace of configurations that correspond to local minima of the 

258 objective function (J3]). 
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Figure 2: Interpolation results for synthetic Gaussian data with anisotropic Matern co- 
variance. Sample data are generated by removal of a block of data in the area marked by 
the dashed rectangle. The DGC results are based on M = 100 simulation runs. Sub-figures 
show data interpolated by (a) OK and (d) DGC, variance by (b) OK and (e) DGC, and 
absolute errors by (c) OK and (f) DGC. 



259 Real Data 

260 J^.2.1. Radioactive Potassium Concentration 

261 The first real data set represents soil concentration of radio active potas- 



262 sium measured by gamma-ray spectrometry over part of Canada (jAnonymous 



263 120081 ) . on a grid with No = 256 x 256 nodes extending in latitude from 56S to 

264 57N and in longitude from — 100W to — 98E, with a resolution of 250 m. The 

265 data have been preprocessed to correct for background and airplane flight 

266 height. The potassium concentrations are in units of % and their summary 

267 statistics £1X6 cLS follows: Nq = 65536, 

^min 

0.39, 3.26, z = 1.60, 

268 z so = 1.61, a z = 0.52, skewness coefficient equal to 0.10, and kurtosis coef- 
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Table 2: Interpolation validation measures for Gaussian data with anisotropic Matern 
covariance, using (a,b) S = 100 samples generated by 33% and 66% random thinning, 
respectively, and (c) S = 1 sample generated by removal of a solid block of data. The 
DGC uses N c = 1000 and the results are based on M = 1 simulation run in (a) and (b) 
and M = 100 simulation runs in (c). 





MAAE 


MARE [%] 


MAARE [%] 


MRASE 


MR [%] 




{Tcpu 






(a) 


(b) 


(c) 


(a) 


(b) 


(c) 


(a) 


(b) 


(c) 


(a) 


(b) 


(c) 


(a) 


(b) 


(c) 


(a) 


(b) 


(c) 


DGC 


0.17 


0.50 


1.28 


-0.01 


-0.07 


1.70 


0.35 


1.04 


2.45 


0.33 


0.83 


1.71 


99.93 


99.56 


98.84 


3.16 


8.78 


166 


NN 


2.08 


2.09 


5.01 


-0.19 


-0.17 


8.12 


4.25 


4.26 


9.77 


2.65 


2.73 


6.24 


95.61 


95.35 


75.52 


0.04 


0.02 


0.08 


BL 


0.63 


1.01 


4.95 


-0.12 


-0.21 


6.49 


1.29 


2.10 


9.41 


0.91 


1.51 


6.07 


95.59 


95.30 


75.52 


0.04 


0.02 


0.06 


BC 


0.43 


0.78 


4.92 


-0.06 


-0.12 


6.82 


0.89 


1.61 


9.32 


0.65 


1.21 


6.09 


99.74 


99.09 


79.00 


0.04 


0.02 


0.06 


BS 


0.32 


0.55 


5.09 


-0.02 


-0.06 


9.19 


0.65 


1.13 


9.61 


0.41 


0.78 


6.30 


99.90 


99.62 


83.45 


2.06 


0.57 


0.49 


ID 


1.04 


1.44 


5.14 


-0.29 


-0.33 


7.33 


2.15 


2.96 


9.85 


1.34 


1.91 


6.22 


99.09 


97.84 


77.06 


0.16 


0.17 


0.06 


OK 


0.06 


0.31 


2.49 


-1E-5 


-0.01 


4.66 


0.12 


0.65 


4.71 


0.12 


0.49 


3.19 


99.99 


99.85 


96.75 


31.1 


9.15 


2546 



ficient equal to 2.45. A plot of the data in Fig. 3(a) displays clear signs of 
anisotropy. Samples Z(G S ) were generated from the original data by random 
thinning with p = 33% and 66%. 

Classification (N c = 8, 16) and interpolation (N c = 1500) results for 
p = 33% are listed in Table |3j The prediction performance of DGC is su- 
perior to other models except for the BS. The outstanding performance of 
the latter is likely due to the smooth spatial variation of the radioactivity 
data. An example of prediction results based on M = 100 simulation runs 
for one sample realization with p = 66% is shown in Fig. [3J The DGC clas- 
sification CPU time with N c = 8, 16 was 2.6 and 3.3 seconds respectively. 
The computational time for DGC interpolation is comparable to that of BS, 
but one order higher than NN, BL, and BC times. The limiting factor in 
DGC are the MC simulations that involve up to ~ 10 7 Monte Carlo steps 



(see Fig. 4(a)). The histogram in Fig. 4(b) gives the distribution of the ob- 
jective function residuals for 100 accepted configurations and verifies that all 
of them correspond to small (< 2 x 10~ 4 ) values. 
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Figure 3: DGC interpolation of the radioactivity data obtained from M = 100 simulation 
runs using one sample set generated by 66% thinning. The plots are as those in Fig. Q] 



Table 3: Classification (N c — 8, 16) and interpolation (N c = 1500) results for the radioac- 
tivity data with p = 33% thinning. DGC is compared with k-nearest neighbor (KNN) and 
fuzzy k-nearest neighbor (FKNN) models for classifications, and with the nearest neigh- 
bor (NN), bilinear (BL), bicubic (BC), biharmonic spline (BS) and inverse distance (ID) 
models for interpolation. 





Classification 






Interpolat 


ion 










N c = 8 


iV c = 16 






N c = 1500 








Model 


(F*) S F * 


(F*) S F . 


MAAE MARE [% 


MAARE [%] 


MRASE MR [%] 


{Fcpu ) 


DGC 


4.01 0.21 


8.64 0.28 


9.4e-4 


-2.2e-3 


6.8e-2 


1.56e- 


-3 


100.00 


39.94 


KNN 


4.96 0.16 


11.44 0.22 
















FKNN 


4.22 0.16 


10.14 0.23 
















NN 






2.3e-2 


-0.165 


1.64 


3.5e- 


-2 


99.78 


1.41 


BL 






3.7e-3 


-5.9e-2 


0.27 


5.9e- 


-3 


99.78 


1.40 


BC 






1.7e-3 


-2.2e-2 


0.12 


2.9e- 


-3 


100.00 


1.46 


BS 






4.7e-4 


-1.2e-3 


3.4e-2 


7.6e- 


-4 


100.00 


32.82 


ID 






1.3e-2 


-0.160 


0.90 


1.7e- 


-2 


99.95 


183.64 
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(a) Objective function evolution 
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Figure 4: Left: Evolution of the objective (cost) function ([3]) for potassium concentration 
versus the number of Monte Carlo steps. Inset focuses on the convergence to the optimum. 
Right: Histogram of the objective function residual values obtained from 100 different runs. 



4-2.2. Ozone Layer Thickness 

The second real-world data set represents d aily c olumn ozone measure- 



287 ments on June 1, 2007 (lAcker and Leptoukhl . 120071 ). The data are on a 



288 1° x 1° grid with Nq = 180 x 360 nodes extending in latitude from 90S 

289 to 90N and in longitude from 180W to 180E. The data set includes natu- 

290 rally missing (and therefore unknown) values. The data are in Dobson units 

291 with the following summary statistics: N = 48501, z min = 158, z max = 596, 

292 z = 311.58, z .5u = 308, o z = 46.05, skewness coefficient equal to 0.31, and 

293 kurtosis coefficient equal to 2.30. The gaps are mainly due to limited cov- 

294 erage on the particular day, generating conspicuous stripes of missing values 

295 in the south-north direction. Since the true values at these locations are 

296 not known, validation measures are not evaluated. Instead, the interpolation 

297 quality is assessed empirically, based on the visual continuity between the 
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298 observed data and the predictions. 



In the DGC reconstructed image, as shown in Fig. 5(b) , some traces of 



300 the stripe pattern due to overestimation in low-value areas (averaging effect) 

301 can still be observed. However, this effect is somewhat less pronounced than 

302 in other interpolation methods, presented in Fig. O Indeed, histograms of 

303 the predicted values, see Fig. [3, show a larger proportion of DGC predictions 

304 in bins below the sample average z = 311.58 compared to the other methods. 
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X 
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X 

(b) DGC interpolation 
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Figure 5: DGC interpolation results for ozone data, obtained from M = 100 simulation 
runs on one set of the original data with missing values: Original data (a), interpolated 
data based on the median values from M runs (b), and spatial distribution of the 95% 
confidence interval (c.i.) widths (c). 
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Figure 6: Interpolation results for ozone data, using the nearest-neighbor (NN) (a), the 
inverse distance (ID) (b), and the biharmonic spline (BS) (c) methods. 
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Figure 7: Histograms of the ozone values predicted by the respective methods: DGC, NN, 
ID, and BS. 



5. Discussion and Conclusions 



We presented and investigated the DGC method for the prediction of 
missing data on rectangular grids. DGC is based on stochastic simulation 
conditioned by sample data with a global objective function that accounts 
for anisotropic correlations. The constraints involve normalized directional 
gradient and curvature energies in specified directions. The simulation sam- 
ples the configuration subspace that leads to local minima of the objective 
function. 

For reliable application of DGC sufficiently high sampling density and 
number of data for the calculation of sample constraints are desirable. We 
evaluated the average numbers n p of nearest-neighbor sampling-point pairs 
per direction and n t of compact triplets of sampling points per direction. The 
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first, n p , is equal to the number of terms involved in G n (l s ), while the second, 
n t , is the number of terms in C n (l s ). For uniform random thinning n p and n t 
depend only on the degree of thinning p and the domain size L. For p = 66% 
we obtained (n p , n t ) = (280, 92) for L = 50 and (n p , n t ) = (7523, 2542) for 
L = 256 without significant differences between different directions. These 
values are sufficient for reliable estimates of G n (I s ) and C n (I s ). However, 
smaller grids or higher thinning degrees can lead to insufficient sampling. 

Regarding sensitivity of DGC to noise, we have run tests on simulated 
random field realizations to which Gaussian white noise is added. DGC seems 
more sensitive to noise than other interpolation methods (e.g., BL, BC, BS), 
resulting in a larger increase of cross-validation errors with increasing noise 
variance. This effect is caused by the fact that methods like BL, BC, and BS 
perform some smoothing of the noise by means of the weighted average over 
extended neighbors. DGC on the other hand focuses on correlations over a 
small local neighborhood, which are sensitive to noise. Hence, in its current 
formulation DGC is more useful for smooth data distributions, such as the 
ones studied herein. For noisy data, improvements can be made by developing 



directional kernel-based estimators 



the spirit of (jElogne fc Hristopulos 



'or the square gradient and curvat ure in 



2008 



Hristopulos &; El ognc 



336 incor porating an initial filtering stage to reduce noise ( 



1996|). 



Brownrigg 



20091) or by 



1984 



Yinl . 



DGC does not rely on assumptions about the probability distribution of 
the data, it is reasonably efficient computationally, and it requires very little 
user input (i.e., the number of simulation runs, the number of class levels and 
the size of the maximum stencil for initial state selection). Potential applica- 
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tions include filling of data gaps in satellite images and restoration of dam- 
aged digital records. For applications in the interpolation of data sampled on 
an irregular grid, DGC needs to be extended to account for the lack of grid 
structure. This can be accomplished using kernel functions with adjustable 



346 band width as shown in ( lElogne fc Hristopulos 



2009|). 



2008 



Hristopulos fc Elognd . 



DGC shares conceptual similarities with interpolation methods based on 
splines, which are generated by minimizing an objective function formed 
by th e linear comb ination of the square gradient and the square curva- 



35i ture (IWessel 



20091 ). A special case of the splines-based approach is the 



BS method used above for comparison purposes. DGC does not require min- 
imization of the square gradient and curvature but requires matching the 
sample and entire-grid values of these constraints. Splines-based methods 
require solving a linear system involving the Green's function of the interpo- 
lation operator; the numerical complexity of this calculation scales with the 
third power of the system size. DGC does not require such a costly operation, 
because the objective functional is defined using local couplings. Finally, in 
contrast with splines interpolation, DGC introduces a stochastic element by 
sampling the configuration subspace of local minima of the DGC objective 
function. On the other hand, splines-based methods can handle irregularly 
spaced samples, while DGC is currently restricted to grid data. 
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